SECTION I


INTRODUCTION 


In 1972 and early 1973, two Design Verification Models (DVM's) of the
Remote-Terminal Emulator (RTE) were developed by The MITRE Corporation
under the sponsorship of the Air Force Directorate of Automated Data
Processing Equipment Selection (MCS)*. The fixed-site system, also
referred to as the lab system, was installed at MITRE/Bedford,
initially in H-Building and later in D-Building. The on-site, or
field-test, system was initially installed at MITRE/Bedford and later
moved to the following locations: Air Force Cambridge Research
Laboratories (AFCRL) at Hanscom AFB, Bedford, Massachusetts; Rome Air
Development Center (RADC) at Griffiss AFB, Rome, New York; Air Force
Data Service Center (AFDSC) at the Pentagon; and National Security
Agency (NSA) at Fort George G. Meade, Maryland. In early 1975, the
field-test system was moved to Hanscom AFB, Bedford, Massachusetts.

Throughout  this  development  and  test  period,  there  appeared  to 
be  an  abnormal  amount  of  down-time  due  to  various  hardware  failures. 
Since  further  test  and  experimental  uses  are  planned  for  both  the  lab 
and  field-test  systems,  it  was  decided  to  conduct  an  investigation 
of  the  problems  to  date  in  the  hope  that  they  could  be  minimized  in 
the  future. 

In  this  report,  we  review  hardware  failures  that  have  occurred 
in  these  systems,  determine  if  they  were  abnormally  high  for  this 
class  of  equipment,  and  develop  recommendations  to  help  improve  the 
overall  availability  of  these  systems  in  the  future. 

The  results  of  a survey  of  hardware  failures  in  RTE  systems  up 
to  December  1974  are  presented  in  Section  II.  In  the  case  of  the 
fixed  site  system,  whose  components  are  shared  with  the  Data  Handling 
Laboratory,  the  reported  failures  pertain  only  to  RTE  components. 

Section  III  describes  a theoretical  reliability  model  of  RTE 
subsystems,  using  part  population  counts,  with  each  part  reliability 
as  specified  in  MIL  standards.  A comparison  of  the  actual  results 
and  those  predicted  by  the  reliability  model  is  presented  in 
Section  IV,  including  individual  discussions  for  each  subsystem. 


The major problem areas, as well as potential corrective actions,
are discussed in Section V. Finally, considerations about future
failure documentation are presented in Section VI.

Appendix I includes the actual survey of RTE failures in chronological
order, while Appendix II includes the detailed calculations
of the predicted failure rate for each subsystem.


*D.L. James and D.W. Lambert, "Remote-Terminal Emulator (Design
Verification Model) - Introduction and Summary," ESD-TR-74-372,
Electronic Systems Division, Air Force Systems Command, Hanscom AFB,
Bedford, Massachusetts, February 1975.
SECTION II


FAILURE  SURVEY 


DEFICIENCY  CONCEPT 

In attempting to list the failures of any type of equipment, it
is important to define precisely which types of failures will be
included in the survey. The range could extend from a single bit loss
in a communication link to total system failure. For this survey,
the decision about listing a particular failure was made on the basis
of the type of deficiency that caused it.

According to Military Handbook 217A, "Reliability Stress and
Failure Rate Data for Electronic Equipment," a deficiency is defined
as a possible cause of failure because a part, equipment or system
lacks some quality necessary to function according to specifications.
In other words, a deficiency is the cause of a failure; therefore,
every failure has at least one associated deficiency. It is possible
for two deficiencies to cause a specific failure, in which case
correcting one may not solve the problem; on the other hand,
a single deficiency may be the cause of several failures.

Deficiencies can exist in hardware and yet never manifest themselves
as failures because the equipment is not exercised environmentally
and functionally to the level which causes failure.

Failure rate is the frequency per unit time that a deficiency
is manifested as a failure. Correcting or eliminating a deficiency
has a direct effect on failure rate. Therefore, equipment failure
rate can be improved either by eliminating deficiencies or by
reducing the probability of deficiencies manifesting themselves as
failures.

Deficiencies  can  be  classified  in  the  following  types: 

a)  Initial  deficiencies 

b)  Component  malfunctions 

c)  Introduced  deficiencies 

The first type includes all design, fabrication and installation
deficiencies. They normally cause failures during the initial set-up
of the system. Failures caused by those deficiencies were not
included in this survey unless the failures occurred much later than
the original installation.

The second type includes all component failures, such as those of
integrated circuits, resistors, contacts, motors, etc. Assuming perfect
servicing of the equipment and complete elimination of all other
deficiencies, these are the only failures that should occur in the
operation of the equipment. Any theoretical reliability model
predicting mean time between failures will be based on this type of
failure only. In order to compare the predicted and actual failure
counts in the following sections, these failures were specifically
identified in the survey.

The third type of deficiency consists of those introduced either by
attempting to correct a deficiency of another type or, since we are
dealing with several interactive subsystems, by other subsystem
failures. These failures will be accounted for separately.


SURVEY  RESULTS 

The following survey includes the hardware failures of the RTE
Design Verification Models (both field test and lab system) during
the period July 1972 to December 1974. In the case of the lab
system, whose hardware is shared with the Data Handling Laboratory,
the recorded failures are the ones related to RTE components
exclusively.

Table  I presents  a summary  of  results,  including  the  number  of 
failures  in  each  individual  subsystem  for  the  specified  period. 

Only  two  types  of  failures  were  considered: 

1)  Failures  due  to  component  malfunctions. 

2)  Failures  due  to  introduced  deficiencies. 

The actual survey is presented in Appendix I, where each entry
contains a failure description, the date of occurrence and the
failure classification (due to a component malfunction or an
introduced deficiency).

It should be pointed out that if a failure recurred a short
period after the deficiency was presumably corrected, it was counted
as a single failure. Also, if a failure due to an initial deficiency
occurred much later than the original installation, it was counted
as an introduced deficiency.
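
The survey entries and the per-subsystem counts can be pictured with a small sketch. It assumes a record layout with the three fields named in the text (description, date, classification) plus a subsystem label used for grouping; the class names, field names and sample entries are hypothetical and are not taken from the survey itself.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Classification(Enum):
    COMPONENT_MALFUNCTION = "component malfunction"
    INTRODUCED_DEFICIENCY = "introduced deficiency"


@dataclass(frozen=True)
class FailureEntry:
    subsystem: str            # hypothetical field, used to group the counts
    occurred_on: date
    description: str
    classification: Classification


def tally(entries):
    """Count entries per (subsystem, classification) pair."""
    return Counter((e.subsystem, e.classification) for e in entries)


# Hypothetical entries for illustration only; not taken from the survey.
sample_entries = [
    FailureEntry("Memory", date(1974, 9, 3), "memory board replaced",
                 Classification.COMPONENT_MALFUNCTION),
    FailureEntry("Memory", date(1974, 10, 15), "solder bridge left by repair",
                 Classification.INTRODUCED_DEFICIENCY),
    FailureEntry("Power Supply", date(1974, 11, 2), "voltage regulator failed",
                 Classification.COMPONENT_MALFUNCTION),
]
sample_counts = tally(sample_entries)
```

A summary such as Table I is then one row per subsystem, with one column per classification.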
Table I

Summary of Failure Survey Results

                                                   Component          Introduced
  Subsystem                                     Malfunctions        Deficiencies

Lab System

  Central Processor & Operator Panel                  2                   1
  Memory                                              4                  10
  Disk                                                1                   -
  Mag Tape                                            6                   2
  Printer                                             2                   3
  Teletype                                            6                   4
  Paper Tape Reader                                   -                   -
  Readable Real Time Clock and Interval Timer         2                   -
  Card Reader                                         1                   1
  ALU                                                 1                   1
  Modems                                              1                  14
  Power Supply                                        5                   3
  TOTAL                                              31                  37

Field Test System

  Central Processor & Operator Panel                  2                   1
  Memory                                             10                   6
  Disk                                                3                   3
  Mag Tape                                            4                   4
  Printer                                             1                   3
  Teletype                                            2                   1
  Paper Tape Reader                                   -                   1
  Readable Real Time Clock and Interval Timer         1                   -
  ALU                                                 2                   1
  Digital I/O                                         1                   -
  Power Supply                                        5                   3
  TOTAL                                              31                  23
Sources for this survey included the Mini-Computer Facility
report, Data General Field Service bills and Air Force reports.
The survey results were also correlated with the installation
log book for a randomly chosen one-month period, to determine
whether the log book included failures not reported elsewhere.
No additional failures were found.
SECTION III


RELIABILITY  MODEL 


GENERAL 


In order to determine if the actual failure rate for each version
of the DVM was abnormal, it was decided to compare it against the
results predicted by a theoretical reliability model based on stress
factors and part population, as specified in Military Handbook 217A.
This failure prediction method was chosen for two reasons:

1)  The  abundance  of  data  on  parts  failure  rates 

2)  Some  work  by  Data  General  using  the  same  model, 
which  simplified  the  gathering  of  data. 

Basically,  the  model  is  based  on  the  fact  that  the  failure  rate  of  a 
subsystem  is  dependent  on  the  failure  rate  of  each  individual  part, 
such  as  resistors,  capacitors,  integrated  circuits,  contacts,  soldered 
connections,  etc. 


Assuming n components, the probability of no failures in a time
t is as follows:

$$P(t) = P_1(t) \cdot P_2(t) \cdots P_i(t) \cdots P_n(t)$$

where $P_i(t)$ = probability of no failure in the ith part.

Assuming a Poisson distribution for failure arrivals, the
probability of no failures in the ith part during time t is:

$$P_i(t) = e^{-\lambda_i t}$$

where $\lambda_i$ = failures per unit time of the ith component.

The probability of no failures for the total subsystem then
becomes:

$$P(t) = e^{-\lambda_1 t} \, e^{-\lambda_2 t} \cdots e^{-\lambda_n t} = e^{-\lambda t}$$

where $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n$ = failure rate of the subsystem;

i.e., the failure rate for the subsystem is equal to the sum of the
failure rates of all individual parts. Similarly, the total system
failure rate is equal to the sum of the individual subsystem failure
rates.
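
The relation above can be checked numerically with a short sketch. It assumes, as the product form does, that parts fail independently; the function names and the three example rates are hypothetical and are not taken from the report.

```python
import math


def subsystem_failure_rate(part_rates_per_hour):
    """Sum of the individual part failure rates (failures per hour)."""
    return sum(part_rates_per_hour)


def probability_of_no_failure(part_rates_per_hour, hours):
    """P(t) = exp(-lambda * t), with lambda the summed part failure rate."""
    return math.exp(-subsystem_failure_rate(part_rates_per_hour) * hours)


# Hypothetical part failure rates in failures per hour (not from the report).
example_rates = [2e-6, 5e-6, 1e-6]

# Survival probability over 1000 hours, computed two ways: from the summed
# rate, and as the product of the individual part survival probabilities.
p_from_sum = probability_of_no_failure(example_rates, 1000.0)
p_from_product = math.prod(math.exp(-rate * 1000.0) for rate in example_rates)
```

Both routes give the same value, which is why the part failure rates can simply be added.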
In conclusion, if the failure rate for each individual part is
known, the probability of failure for the total system can be
determined. The next paragraph deals with part failure rates; the
following one includes failure rate predictions for each subsystem,
such as central processor, memory, ALU, etc.; and finally,
total system considerations are presented.

It  should  be  pointed  out  that  all  the  failure  rates  predicted 
in  these  equations  are  for  an  optimum  operating  temperature. 


FAILURE  RATE  FOR  PARTS 

According  to  MIL-HDBK-217A,  there  are  basically  three  steps  in 
predicting  parts  failure  rates  as  follows: 

a)  Determination of Stresses for Each Part

    The stresses to be considered are the ones associated
    with the cause of the principal modes of failure of
    the part, e.g., conditions such as power dissipation
    for resistors, voltages for capacitors, etc. The
    result is expressed as a ratio of the actual stress to
    the rated stress, which is the stated military rating
    for the part working under nominal conditions.* All
    the applicable stress factors are shown in the first
    column of Table II and are as provided by Data General
    in their Reliability Report for NOVA Minicomputers.

    Where no stress is applicable or it was not known,
    the ratio was assumed to be 1 (i.e., the part was
    assumed to work under nominal conditions).

b)  Determination of Basic Failure Rate

    Using MIL-HDBK-217A and the stress ratio, determine
    the basic failure rate of each part. The results
    are presented in the second column of Table II.
    They represent the failure rate for the part under
    controlled test conditions, which usually differs
    from the failure rate for the part when used in an
    equipment. The application K-factors take this
    into account.

c)  Determination of Application K-factors

    To take into account the application environment for
    the part, it is necessary to multiply its basic failure
    rate by a factor dependent on the intended use.
    MIL-HDBK-217A provides K-factors for each part type
    and category of equipment, namely, fixed ground,
    vehicle mounted ground, shipboard, airborne, etc.

    Table II shows the application K-factors for three
    environments: (a) fixed ground, to be used as a
    reference; (b) RTE lab system, which was estimated
    to be larger than fixed ground due to the frequent
    configuration changes and updates; and (c) RTE field
    test system, which was estimated at roughly 75% of
    vehicle mounted ground, to take into account all
    the moves to different field locations.

The failure rate for each part type and class is finally shown
in Table II; it was obtained by multiplying the basic failure
rate by the corresponding K-factor. When the K-factor was not
known, it was assumed to be 1.


* Nominal conditions are understood to be a 1/2 watt resistor
dissipating 1/2 watt, a 12 volt capacitor working at
12 volts, etc.


Table II

Parts Failure Rates

(Table body not reproduced.)

(1)  As calculated by Data General.

(2)  From MIL-HDBK-217A, unless otherwise noted.

(3)  Semiconductor manufacturers' data (as supplied by Data General).

(4)  Based on IBM operational data for the 4 Pi series of computers.

(5)  Failure rate calculated from base failure rate of 10 plus increase of 0.025 for every mating (80 matings assumed).


Subsystem  Failure  Rate 

The  subsystem  failure  rate  is  the  summation  of  the  failure 
rates  for  the  individual  parts.  The  procedure  that  was  followed 
was  to  obtain  a part  population  count  (i.e.,  how  many  resistors, 
how  many  integrated  circuits,  etc.)  and  then  multiply  the  failure 
rate  for  that  part  by  its  population.  The  addition  of  those  values 
is  the  predicted  failure  rate  for  the  subsystem. 
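
The procedure just described (basic failure rate, times application K-factor, times part population, summed over all part types) can be sketched as follows. The function names and the example part list are hypothetical; the counts, rates and K-factors are invented for illustration and are not the Table II or Appendix II values.

```python
def part_failure_rate(basic_rate, k_factor=None):
    """Basic failure rate times the application K-factor (1 if unknown)."""
    return basic_rate * (1.0 if k_factor is None else k_factor)


def subsystem_failure_rate(parts):
    """Sum of population * part failure rate over all part types.

    `parts` is an iterable of (population, basic_rate, k_factor) tuples,
    with rates expressed in failures per million hours.
    """
    return sum(count * part_failure_rate(rate, k) for count, rate, k in parts)


# Hypothetical part list, for illustration only.
example_parts = [
    (100, 0.01, 2.0),   # e.g. resistors
    (40, 0.05, 2.0),    # e.g. capacitors
    (25, 0.5, None),    # e.g. integrated circuits, K-factor unknown
]
example_rate = subsystem_failure_rate(example_parts)  # failures per 10^6 hours
```

With these invented figures the three part types contribute 2.0, 4.0 and 12.5 failures per million hours, so one part type with a high basic rate can dominate the subsystem total even when its population is small.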

Parts  population  counts  for  the  Central  Processor,  Operator 
Panel,  Power  Supply  and  Memory  Modules  were  supplied  by  Data  General. 

For  the  peripheral  controllers,  Digital  Computer  Controls 
Asynchronous  Line  Units,  Digital  I/O,  Real  Time  Clock  and  Interval 
Timer,  the  parts  counts  were  estimated  from  circuit  diagrams  and 
visual  observation  of  the  boards. 

For  peripheral  devices,  the  failure  rate  was  estimated  based 
on  their  mechanical  complexity  since  no  formal  method  is  available. 

The detailed part counts and subsystem failure rate calculations
are presented in Appendix II. A summary of the results, including
the system failure rate (the sum of the individual subsystem
failure rates) and its reciprocal, the MTBF (Mean Time Between
Failures), is presented in Table III.
Table III

System Failure Rates

(Failure rates are in failures per million hours.)

                                                  No.     Failure     Failure
  Subsystem                                     Units   Rate/Unit        Rate

Lab System (1)

  Central Processor & Operator Panel                1         216         216
  Memory                                            5         103         515
  Disk                                              1         300         300
  Mag Tape                                          1         500         500
  Printer                                           1         500         500
  Paper Tape Reader                                 1         200         200
  Readable Real Time Clock and Interval Timer       1          42          42
  Card Reader                                       1         200         200
  ALU                                               1         175         175
  Power Supply                                      1          58          58
  System Failure Rate                                                    2706

  Mean Time Between Failures = 10^6/2706 = 369 hours

Field Test System (2)

  Central Processor & Operator Panel                1         415         415
  Memory                                            6         174        1044
  Disk                                              1         450         450
  Mag Tape                                          1         750         750
  Printer                                           1         750         750
  Paper Tape Reader                                 1         300         300
  Readable Real Time Clock and Interval Timer       1          69          69
  ALU                                               8         177        1416
  Digital I/O Board                                12          60         720
  Digital I/O Terminator                            1          62          62
  Power Supply                                      2         138         276
  System Failure Rate                                                    6252

  Mean Time Between Failures = 10^6/6252 = 160 hours

(1)  Not including Teletype and Modems.

(2)  Not including Teletype.
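
As a cross-check, the system failure rates and MTBF values of Table III can be recomputed from the unit counts and per-unit rates. This is only a sketch: the dictionary and function names are hypothetical, while the numbers are the ones tabulated above.

```python
# (number of units, failure rate per unit in failures per million hours),
# as tabulated in Table III.
LAB_SYSTEM = {
    "Central Processor & Operator Panel": (1, 216),
    "Memory": (5, 103),
    "Disk": (1, 300),
    "Mag Tape": (1, 500),
    "Printer": (1, 500),
    "Paper Tape Reader": (1, 200),
    "Readable Real Time Clock and Interval Timer": (1, 42),
    "Card Reader": (1, 200),
    "ALU": (1, 175),
    "Power Supply": (1, 58),
}
FIELD_TEST_SYSTEM = {
    "Central Processor & Operator Panel": (1, 415),
    "Memory": (6, 174),
    "Disk": (1, 450),
    "Mag Tape": (1, 750),
    "Printer": (1, 750),
    "Paper Tape Reader": (1, 300),
    "Readable Real Time Clock and Interval Timer": (1, 69),
    "ALU": (8, 177),
    "Digital I/O Board": (12, 60),
    "Digital I/O Terminator": (1, 62),
    "Power Supply": (2, 138),
}


def system_failure_rate(subsystems):
    """Sum of units * per-unit rate, in failures per million hours."""
    return sum(units * rate for units, rate in subsystems.values())


def mtbf_hours(rate_per_million_hours):
    """MTBF is the reciprocal of the failure rate."""
    return 1_000_000 / rate_per_million_hours


lab_rate = system_failure_rate(LAB_SYSTEM)            # 2706
field_rate = system_failure_rate(FIELD_TEST_SYSTEM)   # 6252
lab_mtbf = mtbf_hours(lab_rate)                       # about 369.5 hours
field_mtbf = mtbf_hours(field_rate)                   # about 159.9 hours
```

The recomputed totals match the tabulated system failure rates, and the MTBF values agree with the 369 and 160 hours quoted in the table to within rounding.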
SECTION IV


COMPARISON  OF  RESULTS 


SYSTEM  RELIABILITY 

Table IV presents a comparison of the predicted failure rate
and the actual experience with the DVM's. The first column lists
the number of weeks each component was in operation. Under the
assumption that the lab system was powered up an average of 55 hours
a week and the field test system an average of 45 hours a week, the
operating hours were calculated and listed in the second column.

The actual failure rates were calculated by dividing the number
of failures by the power-up hours. It should be pointed out
that this may not be correct for some subsystems, such as the card
reader or paper tape reader, but since their contribution to the total
failure rate was minimal, and for consistency, all subsystems
were assumed to be powered up all the time.
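
The arithmetic behind the operating hours and actual failure rates can be sketched as below. The function names are hypothetical; the weeks, hours per week and failure counts are the figures quoted in the text and in Table IV.

```python
def operating_hours(weeks, hours_per_week):
    """Power-up hours, assuming a constant number of hours per week."""
    return weeks * hours_per_week


def failure_rate_per_million_hours(failures, hours):
    """Observed failures divided by power-up hours, scaled to 10^6 hours."""
    return failures / hours * 1_000_000


lab_hours = operating_hours(121, 55)     # 6655 hours
field_hours = operating_hours(87, 45)    # 3915 hours

# Memory: 4 component malfunctions in the lab system, 10 in the field test.
lab_memory_rate = failure_rate_per_million_hours(4, lab_hours)       # about 601
field_memory_rate = failure_rate_per_million_hours(10, field_hours)  # about 2554
```

Because every subsystem is charged with the full power-up time, a device that was actually used far less, such as the card reader, will show a lower apparent failure rate per hour than it would per hour of real use.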

In considering the reliability and actual performance of the
total system, the most used parameter is the MTBF, or Mean Time
Between Failures, which is the reciprocal of the failure
rate.


For  the  lab  system,  without  including  teletype  and  modems,  the 
actual  MTBF  due  to  component  malfunctions  was  277  hours,  which 
should  be  compared  with  the  estimated  369  hours  as  expected  from 
the  reliability  model.  If  the  teletype  and  modems  are  included, 
the  MTBF  is  reduced  to  215  hours,  and  if  introduced  deficiencies  are 
taken  into  account  as  failures,  the  MTBF  is  102  hours. 

For  the  field  test  system,  the  predicted  MTBF  was  160  hours; 
in  actual  operation  it  was  130  hours  and  if  introduced  deficiencies 
are  included,  the  actual  MTBF  is  reduced  to  70  hours. 

All these values imply that the expected MTBF correlates with
the actual experience for the field test system and, to a lesser
extent, with the lab system. If the introduced deficiencies are
included, however, there appear to be many more failures than expected.
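
One way to make "correlates" concrete is to express the actual MTBF as a fraction of the predicted MTBF. The sketch below does this for the component-malfunction figures quoted above; the function name is hypothetical and the choice of a simple ratio is an illustrative assumption, not the report's stated method.

```python
def actual_to_predicted(actual_mtbf_hours, predicted_mtbf_hours):
    """Fraction of the predicted MTBF that was achieved in operation."""
    return actual_mtbf_hours / predicted_mtbf_hours


# MTBF figures quoted in the text (component malfunctions only).
lab_ratio = actual_to_predicted(277, 369)     # lab, without teletype and modems
field_ratio = actual_to_predicted(130, 160)   # field test
```

The field test system achieved roughly 81 percent of its predicted MTBF and the lab system roughly 75 percent, which is consistent with the closer agreement noted for the field test system.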

Table IV

Actual and Expected Failure Rate

                                                   No. of                Actual     Actual     Actual  Predicted
                                                 Weeks in  Operating     No. of Introduced   Rate Per   Rate Per
  Subsystem                                     Operation     Hours*   Failures   Failures 10^6 Hours 10^6 Hours

Lab System

  Central Processor & Operator Panel                  121       6655          2          1        301        216
  Memory                                              121       6655          4         10        601        515
  Disk                                                121       6655          1          0        150        300
  Mag Tape                                            104       5720          6          2       1049        500
  Printer                                             121       6655          2          3        301        500
  Teletype                                            121       6655          6          4        902          -
  Paper Tape Reader                                   121       6655          0          0          0        200
  Real Time Clock & Interval Timer                    121       6655          2          0        301         42
  Card Reader                                         121       6655          1          1        150        200
  ALU                                                 121       6655          1          1        150        175
  Modems                                              121       6655          1         14        150          -
  Power Supply                                        121       6655          5          3        751         58
  TOTAL SYSTEM                                        121       6655         31         37       4658          -
  TOTAL SYSTEM (Without Teletype and Modems)          121       6655         24         19       3606       2706
  Mean Time Between Failures (hours)                                                              277        369

Field Test System

  Central Processor & Operator Panel                   87       3915          2          1        511        415
  Memory                                               87       3915         10          6       2554       1044
  Disk                                                 87       3915          3          3        766        450
  Magnetic Tape                                        87       3915          4          4       1022        750
  Printer                                              87       3915          1          3        255        750
  Teletype                                             87       3915          2          1        511          -
  Paper Tape Reader                                    87       3915          0          1        255        750
  Real Time Clock & Interval Timer                     87       3915          1          0        255         69
  ALU                                                  42       1890          2          1       1058       1416
  Digital I/O                                          42       1890          1          0        529        782
  Power Supply                                         87       3915          5          3       1277        276
  TOTAL SYSTEM                                         87       3915         31         23       8174          -
  TOTAL SYSTEM (Without Teletype)                      87       3915         29         22       7663       6252
  Mean Time Between Failures (hours)                                                              130        160

* Operating hours are power-up hours, which were assumed to be 55 hours/week for the Lab System
and 45 hours/week for the Field Test System.

Two main results can be observed by looking at Table IV, namely:

a)  A small number of subsystems caused a large number of
    failures. This is true for the mag tape and power
    supply on the lab system and memory, tape and power
    supply on the field test system.

b)  There were as many introduced failures as actual
    component malfunctions. Among the many reasons for this result
    are the absence of updates, design faults and numerous
    interactions with other subsystems.

While the operation of the system appears within tolerance, there is
room for improvement in the introduced deficiencies and certain key
subsystems. They will be covered in Section V. In the next
paragraphs, the performance of each individual subsystem will be
reviewed.


SUBSYSTEM  RELIABILITY 

There  were  discrepancies  in  the  actual  and  expected  performance 
of  individual  subsystems.  All  comments  that  follow  are  based  on  the 
data  as  reflected  in  Table  IV. 
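
To see at a glance which subsystems depart most from the prediction, the actual and predicted rates of Table IV can be reduced to a ratio and ranked. The sketch below does this for a few of the subsystems discussed in this section; the names and the ranking helper are hypothetical, while the rates are those tabulated in Table IV.

```python
# (actual, predicted) failure rates per million hours, from Table IV.
LAB_RATES = {
    "Mag Tape": (1049, 500),
    "Power Supply": (751, 58),
}
FIELD_TEST_RATES = {
    "Memory": (2554, 1044),
    "Disk": (766, 450),
    "Power Supply": (1277, 276),
}


def excess_ratio(actual, predicted):
    """How many times the predicted failure rate was actually observed."""
    return actual / predicted


def ranked_by_excess(rates):
    """Subsystems sorted from the largest to the smallest excess ratio."""
    pairs = ((name, excess_ratio(a, p)) for name, (a, p) in rates.items())
    return sorted(pairs, key=lambda item: item[1], reverse=True)


lab_ranking = ranked_by_excess(LAB_RATES)
field_ranking = ranked_by_excess(FIELD_TEST_RATES)
```

In both systems the power supply heads the ranking, and the field test memory comes out at a little under two and a half times its predicted rate.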

Central  Processor  and  Operator  Panel 

Performed within the expected range in both lab and field test
systems.

Memory 

While the lab system memory performed according to the expected
value, the memory of the field test system had almost two and a half
times its share of expected failures plus a large number of
introduced deficiencies. Most of the memory failures were concentrated
in a 3-month period while the system was serviced exclusively by
Data General personnel, without close supervision by MITRE personnel.
The absence of ECO updates, or the incorrect installation of some of
them, resulted in a disproportionate share of introduced
deficiencies. A case in point is a memory board that had to be
repaired because a drop of solder had fallen on it during a previous
repair.

Disk 


While the lab system disk performed as expected, the field-test
disk had almost double the expected number of failures. It should be
pointed out, however, that these results were obtained because of a
single major failure with the disk motor, so no conclusions can be
reached from this unique incident.

Mag  Tape 

The performance of the mag tape drive for both the field-test
and lab systems was below expectations. The main problems were arms
collapsing and some skewing problems (the head not being
perpendicular to the tape movement). This resulted in aborted runs and
incompatibility of recordings between the drives. The failure of
the arms could not be diagnosed for a long time, resulting in numerous
failure incidents from the same deficiency, which was a leak in a
line filter capacitor. The problem of skewing was not corrected
for a long time due to the insistence by Data General personnel that
there was no corresponding adjustment; it was not mentioned in the
relevant literature either.

Not  included  in  the  survey  were  the  numerous  failures  in  the 
lab  system  of  the  AMPEX  tape  drive,  which  was  eventually  replaced 
by  the  WANG  drive. 

Printer 


Performed  within  the  expected  range  for  both  systems.  There 
was  a large  number  of  introduced  deficiencies  due  to  incorrect  paper. 

Teletype 


Failures in both teletypes were numerous, particularly in the
lab system, which required both minor and major overhauls. There is no
accurate comparison data, but the experience of many Model 33 users
has been less than satisfactory. The only practical alternative is
its replacement by a more reliable unit, as was done in the lab
system.

Paper  Tape  Reader 

Performance was as expected in both systems. There was only one
serious failure, in the field test system.

Readable  Real  Time  Clock  and  Interval  Timer 


Performance was less than expected in both lab and field test
systems, but these boards, being of special design and not subject
to the quality inspection and testing of an off-the-shelf product,
may have some minor design deficiencies. In any case, the number
of failures was too low to reach a definitive conclusion.

Card  Reader 


Performance  was  as  expected. 

Asynchronous  Line  Units 

The lab system ALU performed as expected and, while this may
surprise DVM users, so did the field test ALU's. There were
several failures which were attributed to ALU contacts, but the
complexity of the setup for the experiment in which these failures
occurred, namely, 60 EIA interface cables converging on the same
area, made it difficult to trace the cause of the failure. It
should be mentioned that ALU's have the highest expected failure
rate and additional failures are to be expected.

Synchronous  Line  Units 


There was not enough experience with synchronous communication
to include those boards in the reliability study. It should
be mentioned, however, that the Data General SLU's had some design
deficiencies, namely the wrong quiescent voltage for the data lines
and the improper clearing of interrupts. The effect of the latter
deficiency has not yet been established. All tests on the Digital
Computer Controls SLU were unsuccessful.

Digital  I/O 

Digital  I/O  performed  within  the  expected  range.  Some  failures 
were  attributed  to  improper  matings  of  the  contacts. 

Modems 


The  Teledynamics  modems  in  the  lab  system  never  quite  functioned 
satisfactorily.  There  were  numerous  introduced  deficiencies  due  to 
faulty design. As in the case of the teletype, there is no basis
for comparison.

Power  Supplies 

There were from five to ten times as many failures as expected
from those units. While some of the failures can be attributed to
accidental short circuits or overloading of the supplies, there was
more than a normal share of component malfunctions, in particular
voltage regulator failures and numerous blown fuses due to apparent
voltage spikes in the AC lines, which the supplies presumably cannot
handle.

Other  System  Failures 

In spite of the fact that they were not counted as failures in
the survey, it is worth mentioning that the DVM's, the lab system
in particular, proved to be quite susceptible to discharges of static
electricity. This appears to be a general deficiency of NOVA computers,
as corroborated by other users. The problem was much less severe for
the field test system working in humidity-controlled environments
such as large computer rooms.
SECTION V


PROBLEM  AREAS  AND  CORRECTIVE  ACTIONS 


GENERAL 

As  mentioned  in  previous  sections,  there  were  basically  two 
areas  of  deficiency: 

a)  A large  number  of  introduced  deficiencies 

b)  A large  number  of  failures  in  a small  number  of  subsystems, 
namely,  memory,  tape  and  power  supplies. 

These  problem  areas  and  the  potential  corrective  action  will  be 
described  in  detail  in  the  next  paragraphs.  It  should  be  pointed 
out  that  most  of  the  recommended  corrective  actions  are  already 
being  implemented  with  good  results. 


INTRODUCED  DEFICIENCIES  PROBLEM 

Without even considering the numerous initial design deficiencies
in many components (as serious as missing connections, incorrect
etching of the boards, etc.), there were almost 30 failures per
system due to the following basic causes:

a)  Use of the Lab System for Hardware Testing: When adding a
    new device or board, there was no way of debugging or
    testing it other than connecting it to the lab system.
    If the board failed, the result was a system
    failure, power supply overload, etc. In addition, the
    requirement for new backboard wiring for new devices and
    measurements using backboard pins was an ever-present
    potential source of deficiencies.

b)  Incorporation of ECO's: Every so often, Data General
    issues ECO's to correct detected design deficiencies.
    If those ECO's are not incorporated or, worse yet, are only
    partially incorporated, the board eventually fails. On the
    other hand, incorporating those ECO's in the field, under less
    than ideal conditions, is in itself a good source of introduced
    deficiencies. A case in point is the memory failures in
    the field test system during the experiments at NSA from
    August to December, 1974.

c)  Use of New Development Devices: The special requirements
    of the RTE necessitated the use of specially designed
    equipment or the first "off-the-line" models of standard
    equipment. This is reflected in the fact that many serial
    numbers of DVM boards were very low, indicating first
    production models. For these very reasons, those components
    were not thoroughly checked and debugged by the
    manufacturer upon delivery, and they resulted in a large number
    of initial deficiencies. Correcting those deficiencies
    was again in itself a source of new malfunctions.

Corrective  Actions 


Based  on  the  causes  of  failures  as  mentioned  before,  there  are 
a number  of  corrective  actions  that  could  be  taken  to  minimize  the 
introduction  of  new  deficiencies.  Many  of  these  actions  have  been 
incorporated  into  the  normal  maintenance  procedures  to  produce  good 
results.  They  are  as  follows: 


a)  Timely incorporation of ECO's: Proper notification of
    the existence of ECO's must be obtained from Data General
    and, if possible, the boards should be sent to the factory
    for incorporation of any ECO other than very minor ones;
    it should never be done in the field. This has the following
    advantages:

    1.  The actual soldering and connections are performed
        in a better environment, less conducive
        to the introduction of new deficiencies.

    2.  The ECO is tested in a system other than the DVM's.

b)  Minimize use of DVM for Hardware Testing: The reduction
    in the introduction of new specialized devices in the Data
    Handling Laboratory has resulted in a decrease in the
    number of introduced deficiencies.


In general, as in maintaining any new
system, there is a learning curve for the proper maintenance
procedures. That learning cannot be completed until the configuration
has remained stable for a while. Since this is now the case,
it is anticipated that extensive preventive measures beyond
the ones mentioned above will not be required.

COMPONENT MALFUNCTIONS


As seen in Section IV, the main areas of deficiency for
component malfunctions were magnetic tape, power supplies and memory.
A potential area of deficiency is the asynchronous line units. Each
will be discussed individually.

Memory 

The field-test system had more memory failures than
would normally be expected. Since all maintenance of those memory
boards was done by Data General personnel, not much can be done
to improve the failure rate other than acquiring a spare board, so
that a faulty one can be easily replaced without affecting the
availability of the system. It should be a rule that ECO's are not
incorporated in the field. Checkerboard tests should be run at
fixed intervals (every month or so) rather than only when a failure
is suspected. Runs should be made overnight if possible. A point worth
noting is that the expected failure rate for the 8K and 4K memory
boards is about the same; therefore, for the same total memory
capacity, systems using 8K boards need half as many boards and will
have about half the predicted memory failure rate of comparable ones
using 4K boards.
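
The comparison between 4K and 8K boards can be checked with the predicted field-test rates listed in Appendix II (174.29 failures per million hours for a 4K module, 182.97 for an 8K module). The following is a minimal sketch only: the 32K-word memory size and the function name are assumptions chosen for illustration, and board failures are assumed to be independent so that their rates add.

```python
import math

# Predicted field-test failure rates from Appendix II (failures per million hours).
RATE_4K_BOARD = 174.29
RATE_8K_BOARD = 182.97


def memory_failure_rate(total_kwords, board_kwords, board_rate):
    """Predicted failure rate of a memory built from identical boards.

    Assumes independent board failures, so the per-board rates simply add.
    """
    boards = math.ceil(total_kwords / board_kwords)
    return boards * board_rate


# Hypothetical 32K-word configuration (an assumption, not taken from the report).
rate_with_4k = memory_failure_rate(32, 4, RATE_4K_BOARD)
rate_with_8k = memory_failure_rate(32, 8, RATE_8K_BOARD)
ratio = rate_with_8k / rate_with_4k

print(f"4K boards: {rate_with_4k:.2f} failures per million hours")
print(f"8K boards: {rate_with_8k:.2f} failures per million hours")
print(f"ratio 8K/4K: {ratio:.2f}")
```

Under these assumptions the ratio comes out slightly above one half, because an 8K board has a marginally higher predicted rate than a 4K board.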

Magnetic  Tape  Drives 

The collapsing arm failures were due to two factors: a leak
in a line filter capacitor and bad contacts. The alignment problem
has not reappeared since the lower roller tension guide was adjusted.
Proper maintenance of the tape drives should include marking the roller
guide adjustments and visually inspecting them for misadjustments.
The test routine should be run periodically, as well as a benchmark
tape for periodic alignment checks.

Power  Supplies 

The regulator needs replacement every 6 months or so. However,
if a failure does not occur, which is normally the case during periods
when the system is not moved and no new boards are being debugged, the
risk of introducing additional deficiencies by this replacement makes
it inadvisable. A spare should always be on hand, however.

ALU 


The expected failure rate for the ALU's is high. If 64-line
applications are going to be extensively used, it is imperative to
acquire a spare board. An alternative approach is to consider the
emulator capacity as 56 lines with 8 spares. This is normally the
case in communications processors.

Other  Failures 


Concerning  the  static  electricity  problem,  not  much  can  be  done 
except  to  improve  the  environment  by  adding  a humidifier  during 
winter  months.  This  is  a general  problem  for  all  NOVA's. 

SECTION VI


FAILURE  DOCUMENTATION 


Proper  maintenance  requires  a learning  period  with  a stable 
system.  This  in  turn  requires  good  documentation  of  failures.  The 
documentation  should  include  the  source  of  the  deficiency,  either 
introduced  or  due  to  component  malfunctions. 

An  installation  failure  log  book  should  be  maintained  rather 
than  relying  on  the  installation  log  book,  since  there  are  large 
differences  in  the  extent  of  the  entries  by  different  users  and 
since  the  log  book  documents  symptoms  rather  than  failures.  This 
makes  it  difficult  to  correlate  them  with  a particular  deficiency. 

The failure logs should contain all the failure incidents,
and once the deficiency is detected, even if not eliminated, it
should be cross-correlated with the pertinent failures. In all
cases where Data General personnel service the machine, the
deficiencies should be documented from the service bills, so care
should be taken that those contain adequate details.
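
One possible shape for such a failure log is sketched below. This is an illustration only: the class names, field names and the identifier `TAPE-FILTER-CAP` are hypothetical and do not describe any existing logging system. The sketch keeps incidents (symptoms) separate from deficiencies (causes) and links them once a deficiency is detected, which is the cross-correlation described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Deficiency:
    ident: str                          # hypothetical identifier
    description: str
    source: str                         # "introduced" or "component malfunction"
    eliminated: bool = False
    service_bill: Optional[str] = None  # service bill reference, if any


@dataclass
class FailureIncident:
    date: str
    subsystem: str
    symptom: str
    deficiency_id: Optional[str] = None  # filled in once the deficiency is detected


@dataclass
class FailureLog:
    incidents: List[FailureIncident] = field(default_factory=list)
    deficiencies: Dict[str, Deficiency] = field(default_factory=dict)

    def record(self, incident: FailureIncident) -> None:
        """Log every failure incident, whether or not its cause is known."""
        self.incidents.append(incident)

    def attribute(self, deficiency: Deficiency, incident_indexes: List[int]) -> None:
        """Cross-correlate a detected deficiency with the pertinent incidents."""
        self.deficiencies[deficiency.ident] = deficiency
        for i in incident_indexes:
            self.incidents[i].deficiency_id = deficiency.ident

    def unexplained(self) -> List[FailureIncident]:
        """Incidents not yet correlated with any deficiency."""
        return [i for i in self.incidents if i.deficiency_id is None]


log = FailureLog()
log.record(FailureIncident("7/25/74", "Magnetic Tape", "Arms collapsing"))
log.record(FailureIncident("8/2/74", "Magnetic Tape", "Parity errors"))
log.attribute(
    Deficiency("TAPE-FILTER-CAP", "Line filter capacitor leaked",
               "component malfunction"),
    incident_indexes=[0],
)
print(len(log.unexplained()))
```

Keeping the two record types separate lets a deficiency that is detected but not yet eliminated remain linked to every incident it explains.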

APPENDIX I


FAILURE  SURVEY  DATA 


Sources for this survey included the Mini-Computer Facility
report, Data General Field Service bills and Air Force reports.
The survey results were also correlated with the installation log
book for one randomly chosen month, to determine whether the log
book included failures not reported elsewhere. No
additional failures were found.

Each failure is classified as: (a) due to component malfunction,
(b) due to introduced deficiency, or (c) not accountable. Based on
the definitions in Section I, the following criteria were used in
classifying the failures:

1)  If a failure recurred a short period after the deficiency
was presumably corrected, it was counted as a single
failure (i.e., the second failure was classified not
accountable).

2)  If a failure could be traced to an initial deficiency, but
occurred much later than the original installation, it was
classified as an introduced deficiency.

3)  If a failure was common to more than one subsystem,
it was counted for each one of them.

4)  The distinction between the causes of the failure (component
malfunction or introduced deficiency) was determined based
on the failure description. If the failure was minor (e.g.,
fuse blown) it was classified as not accountable.
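
The four criteria can be read as a small decision procedure. The sketch below is one interpretation, not the procedure actually used for the survey: the report does not define "a short period", so the 7-day window is an assumed placeholder, and the class, field and function names are hypothetical. The order in which the criteria are tested is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

COMPONENT = "component malfunction"
INTRODUCED = "introduced deficiency"
NOT_ACCOUNTABLE = "not accountable"

# Assumption: the report does not quantify "a short period"; 7 days is a placeholder.
RECURRENCE_WINDOW = timedelta(days=7)


@dataclass
class Failure:
    when: date
    subsystems: List[str]        # a common failure lists several subsystems
    described_cause: str         # COMPONENT or INTRODUCED, judged from the description
    minor: bool = False          # e.g. a blown fuse
    traced_to_initial_deficiency: bool = False
    previous_fix: Optional[date] = None  # when the same deficiency was presumably corrected


def classify(f: Failure) -> str:
    # Criterion 4, second sentence: minor failures are not accountable.
    if f.minor:
        return NOT_ACCOUNTABLE
    # Criterion 1: a recurrence shortly after the presumed fix is not counted again.
    if f.previous_fix is not None and f.when - f.previous_fix <= RECURRENCE_WINDOW:
        return NOT_ACCOUNTABLE
    # Criterion 2: a late failure traced to an initial deficiency is "introduced".
    if f.traced_to_initial_deficiency:
        return INTRODUCED
    # Criterion 4, first sentence: otherwise use the cause judged from the description.
    return f.described_cause


def tally(failures):
    # Criterion 3: a common failure is counted once for each subsystem it affected.
    counts = {}
    for f in failures:
        kind = classify(f)
        for s in f.subsystems:
            row = counts.setdefault(
                s, {COMPONENT: 0, INTRODUCED: 0, NOT_ACCOUNTABLE: 0})
            row[kind] += 1
    return counts


sample = [
    Failure(date(1974, 2, 20), ["CPU", "Power Supply"], COMPONENT),
    Failure(date(1974, 2, 22), ["CPU", "Power Supply"], COMPONENT,
            previous_fix=date(1974, 2, 20)),
    Failure(date(1973, 7, 16), ["Power Supply"], COMPONENT, minor=True),
]
counts = tally(sample)
print(counts["CPU"])
print(counts["Power Supply"])
```

The three sample entries are drawn from the Lab System survey data (the common failure of 2/20/74, its recurrence on 2/22/74, and the fuse replacement of 7/16/73).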

CENTRAL PROCESSOR UNIT FAILURES (INCLUDING OPERATOR PANELS)

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   10/16/72   Bit 4 address register light failure.       1             -             -
                Bulb replaced.
2.   6/6/73     CPU board interchange caused system         -             1             -
3.   2/20/74    Common failure to CPU, Core Memory          1             -             -
                and Power Supply. (System down 2 days).
4.   2/22/74    Failure recurrence. Failures to CPU1,       -             -             1
                CPU2 and delay lines. (System down
                6 days).

     Total number of failures                               2             1             1

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   6/6/73     Timing inconsistency on CPU. ECO            -             1             -
                installed incorrectly.
2.   9/24/74    CPU failed to halt. Broken etch             1             -             -
                repaired.
3.   10/10/74   Could not load system. Found AC2 bad.       1             -             -
                Defective chip replaced.
4.   10/16/74   Could not load system. Replaced two         -             -             1
                chips in CPU1.

     Total number of failures                               2             1             1

MEMORY

Lab System

Note: Individual boards could not be identified from failure data.

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   2/20/74    Common failure to CPU, memory and           2             -             -
                peripherals. Several components in two
                memory boards were replaced. (System
                down 2 days).
2.   2/22/74    Failure recurrence. 4 memory boards         -             -             1
                defective. (System down 6 days).
3.   9/30/74    Core stack failed. Repaired by Data         1             -             -
                General. (System down 7 days)
4.   8/74-10/74 Ten failures of 8103/3800 and               -            10             -
                8103/1461. Failures were traced to a
                design incorporating 2 different chips
                with the same value for a compensating
                capacitor.
5.   11/12/74   8K core failing. Changed sense              1             -             -
                amplifier.
6.   11/13/74   8K core intermittently failing.             -             -             1
                Changed another sense amplifier.

     Total number of failures                               4            10             2


Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date      Board  Failure Description               Malfunction   Deficiency    Applicable

1.   5/74             Unidentified memory failure at
                      Data Service Center.
2.   7/25/74   10     Dropping random bits, random          1             -             -
                      locations. Replaced 3 inhibit
                      drivers.
3.   7/26/74   10     Adjusted read strobe delay. Found
                      address 7101 inoperative.
4.   7/26/74   363    Dropping bit 2. Repaired OC           1             -             -
                      driver.
5.   8/2/74    320    Dropping bits. Replaced               1             -             -
                      transistor.
6.   8/2/74    1852   Dropping bits. Replaced               1             -             -
                      transistors.
7.   8/5/74    320    Dropping bit 9. Replaced              -             -             1
                      transistors and installed 100 pf
                      caps.
8.   8/5/74    363    Dropping bit 0, location 53021.       1             -             -
                      Replaced transistors.
9.   8/28/74   10     Replaced broken core. Intermittent    1             -             -
                      driver causing address errors.
                      Temperature sensitive component
                      replaced.
10.  9/13/74   10     Recurrent failures. Sent to office    -             1             -
                      for repairs.
11.  9/13/74   363    Dropping bits. Repaired sense         1             -             -
                      amplifier.
12.  9/18/74   260    Picking up bits 1 and 14.             1             -             -
                      Replaced transistors.
13.  9/18/74   363    Picking up bits. Adjusted strobe.     -             1             -
14.  9/19/74   363    Picking up bits. Replaced sense       -             1
                      amplifier, capacitor and
                      resistors. Installed ECOs but
                      failure persisted. Resoldered
                      loose connections.
15.  10/16/74  1428   Picking up bits. Failure could not
                      be repeated.
16.  12/5/74   10     Failing. Replaced sense amplifier
                      and other components.
17.  12/9/74   260    Failing. Replaced transistor.         1             -             -
18.  12/10/74  320    Failing. Replaced transistor.         1             -             -
19.  12/11/74  363    Picking up bit 2. Replaced sense      -             1             -
                      amplifier.
20.  12/27/74  320    Picking up bits. Replaced several
                      burned components. Found solder
                      connection to ground.
21.  12/30/74  10     Intermittent failures. Replaced
                      sense amplifier.

     Total number of failures                              10             6             5

DISK

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   9/25/72    Inoperative. Clocking timing pulses         1             -             -
                out of sync. Data General rewrote
                clock timing marks. Two days downtime.

     Total number of failures                               1             -             -

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   5/2/73     Intermittent failures. Bad connection       -             1             -
                in power supply.
2.   6/25/73    System would not load from disk.            -             1             -
                Electrolytic capacitor improperly
                installed.
3.   6/29/73    System would not load from disk.            -             1             -
                Sloppy factory resoldering repaired.
4.   2/13/74    Excessive noise in disk. New drive          -             -             1
                motor ordered.
5.   2/16/74    Drive motor failure. New one                1             -             -
                installed. (System down for 6 days).
6.   2/23/74    Starting relay switch faulty. New one       1             -             -
                installed.
7.   3/7/74     Pronounced squeak. Cooling fan              1             -             -
                slipped on spindle of disk motor.
                Refastened on 3/11/74.

     Total number of failures

MAGNETIC TAPE

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   9/72-1/73  Numerous serious failures of Ampex
                Tape Drive led to its replacement by a
                Wang drive on 1/9/73.
2.   5/4/73     Parity errors while writing. Loose          1             -             -
                connection found.
3.   6/13/73    Parity errors. CPU to Tape Adapter          -             1             -
                Unit connector loose.
4.   6/25/73    Not operational. Defective fuse             1             -
                replaced.
5.   9/13/73-   Tape drive dropping out of system.
     10/13/73   Service call.
6.   11/28/73   Tape would not power up. Fuse blown.        1             -             -
7.   2/22/74    Common failure to CPU, memory and tape      1             -             -
                controller. (System down 6 days.)
8.   9/25/74    Arms collapsed. Cleaned connectors.         -             1             -
9.   11/5/74    Unable to load DOS. Picking up bit 9.       1             -             -
                Replaced sense amplifier.

     Total number of failures                               6             2             1

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   8/29/73    Tape drive dropping out of system.          -             1             -
                Failure did not recur.
2.   4/74       Tape rewind in middle of operation.         -             1             -
                Possible loose connection.
3.   6/74       Arms collapsing. Head alignment             1             -             -
                problems.
4.   7/25/74    Arms collapsing. Line filter capacitor      1             -             -
                leaked. Also adjusted read amplifier
                quiescent voltage.
5.   8/2/74     Parity errors. Moved cables and             -             1             -
                problem did not reappear.
6.   8/8/74     Continuing tape alignment problems.         1             -             -
                Found lower roller tension guide off.
                Adjusted.
7.   10/7/74    Tape rewind during write. Could not         -             1             -
                repeat failure. Replaced chip.
8.   12/11/74   Sporadic failures of arms collapsing.       1             -             -
                Adjusted tension rolls and dynamic and
                electrical skew.

     Total number of failures                               4             4             0


PRINTER

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   9/14/72    Blower fan inoperative. Replaced on         -             1             -
                warranty.
2.   9/14/72    Power supply inoperative. Board             -             1             -
                replaced on warranty.
3.   9/11/73    Fuse blown.                                 -             -             1
4.   12/28/73   Printer dropping column 80. Print           1             -             -
                hammer and switch replaced.
5.   9/16/74    Ribbon jammed. Replaced with thicker        -             1             -
                ribbon.
6.   10/3/74    Incorrect printout. Adjusted drum           1             -             -
                mechanism.

     Total number of failures                               2             3             1

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   7/3/73     Printer would not accept data from
                CPU. Timing problem corrected.
2.   7/9/73     New timing problem not compensated by       -             -             1
                the previous fix.
3.   8/17/73    Printer intermittently dropping out of      -             1             -
                system. Out of tolerance conditions
                were adjusted.
4.   8/27/73    Continuing intermittent failures. Two
                new control boards were installed.
5.   12/28/73   Printer would not power up. Fuse            -             -             1
                blown.
6.   4/74       Ribbon jams. Improper ribbon was used.      -             1             -

     Total number of failures                               1             3             2

TELETYPE

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   12/21/72   Erroneous character transmitted.            -             1             -
                Adjusted home position of commutator
                brush.
2.   1/15/73    Erroneous characters. Commutator brush      1             -             -
                thoroughly cleaned.
3.   2/15/73    TTY would not echo properly. Film of        -             1             -
                oil on keyboard contacts. Contacts
                cleaned.
4.   2/23/73    TTY would not echo properly. Film of        -             1             -
                oil on contacts. TTY returned to
                Teletype Corp. for degreasing and
                recalibration.
5.   3/12/73    TTY returned. Intermittent line feed        1             -             -
                problem. Corrected under warranty for
                factory service.
6.   5/4/73     Excessive noise. Fault would not            -             1             -
                repeat.
7.   6/6/73     Not operational. Partial overhaul           1             -             -
                required.
8.   7/10/73    Power supply failure. Fuse replaced.        -             -             1
9.   8/9/73     TTY would not power up. Fuse replaced.      -             -             1
10.  8/27/73    TTY would not power up. Fuse replaced.      -             -             1
11.  9/7/73     TTY would not power up. Fuse replaced.      -             -             1
12.  9/19/73    Continuous character error. Minor           1             -             -
                overhaul performed.
13.  10/11/73   TTY would not power up. Fuse replaced.      -             -             1
14.  9/27/74    Power switch failure. Replaced switch.      1             -             -

     Total number of failures                               5             4             5
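
The Lab System teletype totals can be re-derived by tallying the fourteen entries by failure type. The short check below is only an illustration; the single-letter codes are shorthand introduced here, not notation used in the survey.

```python
# One code per Lab System teletype entry, in the order listed (1 through 14):
# C = due to component malfunction, I = due to introduced deficiency,
# N = not applicable.
rows = "I C I I C I C N N N N C N C".split()

totals = {code: rows.count(code) for code in "CIN"}
print(totals)  # {'C': 5, 'I': 4, 'N': 5}
```

The five "not applicable" entries are all fuse replacements, consistent with the rule that minor failures are not counted against either cause.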

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   7/25/74    TTY running independently of CPU            1             -             -
                control. Replaced relay.
2.   7/27/74    Not switching properly. Replaced            1             -             -
                local-remote switch.
3.   8/74       Erroneous characters. Adjusted              -             1             -
                commutator brush.

     Total number of failures                               2             1             0

PAPER TAPE READER

Lab System

No Failures

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   7/25/74    Oil leak in capacitor caused                1             -             -
                corrosion. Replaced capacitor and
                diode. (Reader in repairs for 2 weeks.)

     Total number of failures                               1             -             -

REAL-TIME CLOCK AND INTERVAL TIMER

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   10/18/74   Inoperative. Cold solder repaired.          1             -             -
2.   9/24/74    Failure. Replaced chip.                     1             -             -

     Total number of failures                               2             -             -

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   5/30/73    Erroneous readings in interval timer.       1             -             -
                Binary counter chip replaced.

     Total number of failures                               1             -             -

CARD READER

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   9/27/72    Motion errors. Photo electric cell          -             1
                lenses cleaned.
2.   8/23/73    Incorrect reading. Service call.            1             -

     Total number of failures                               1             1

ASYNCHRONOUS LINE UNITS

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   10/26/72   Incorrect carrier detect signals.           1             -             -
                Defective transistor replaced.
2.   7/23/73    Solder bridge on address straps did         -             1             -
                not allow loading of system.

     Total number of failures                               1             1             -

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   2/12/74    Short on ALU board causing spurious         -             1             -
                interrupts.
2.   4/74       Numerous failures. Bad contacts.            1             -             -
3.   4/5/74     Board failure. Bad chip replaced.           1             -             -

     Total number of failures                               2             1



MODEMS AND DIGITAL I/O

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   4/16/73    Continuous malfunctions on                  1             1
                Teledynamics modems including carrier
                detect problems.
2.   4/17/73    Power supply failures. Additional           -             1
                power supply installed.
3.   5/14/73    Continuing failures. Seven modems to        -             7
                be replaced by Teledynamics.
4.   6/15/73    4 additional modems sent to                 -             4
                Teledynamics.
5.   7/23-12/74 Continuous minor failures. Rest of          -             1
                modems sent to Teledynamics.

     Total number of failures                               1            14

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   3/18/74-   Lines not logging on. Digital I/O           1             -             -
     3/30/74    boards not seated properly.

     Total number of failures                               1

POWER SUPPLIES

Lab System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   7/16/73    CPU would not power up. Fuse replaced.      -             -             1
2.   9/19/73    CPU would not power up. Faulty diode        1             -             -
                bridge replaced.
3.   11/20/73   Expansion chassis power failure.            1             -             -
                Service call.
4.   2/20/74    Common failure to CPU, memory and           1             -             -
                power supply. (System down 2 days)
5.   2/22/74    Failure recurrence. Power supplies          -             -             1
                affected. (System down 6 days)
6.   9/23/74    Power supply (+5V) failure. Replaced        1             -             -
                regulator and fuse.
7.   10/30/74   Repaired +15V supply. Failure caused        -             1             -
                by connecting DCC SLUs to expansion
                chassis.
8.   11/12/74   Dual +5V power failure. Replaced            1             -             -
                voltage regulator.

     Total number of failures                               5             1             2

Field-Test System

                                                                  Failure Type
                                                        Due To        Due To
                                                        Component     Introduced    Not
     Date       Failure Description                     Malfunction   Deficiency    Applicable

1.   5/15/73    System would not power up. Power            1             -             -
                supply fuse holder assembly replaced.
2.   11/12/73   Two power supplies for the CPU had          1             -             -
                failed. Taken to factory to repair.
                (System down for 4 days).
3.   11/19/73   Power supply failure. Several broken
                etches on a power supply repaired.
4.   11/23/73   Power supply failure. Break in              1             -
                insulation caused a short. (System
                down for 6 days for spare parts).
5.   12/14/73   Power failure. Fuse blown.
6.   2/74-5/74  15 amp fuse blown periodically.
                Possible voltage spikes.
7.   5/74-7/74  Numerous fuse failures. Possible
                incorrect line voltages.
8.   7/30/74    Failure. Replaced +15V voltage
                regulator.
9.   9/18/74    Failure. Bad ripple on Dual +5V power
                supply. Replaced regulator.
10.  12/9/74    Several system failures. Replaced           -             1
                defective transistors in power supply.

     Total number of failures                               5             3             2

APPENDIX II

PREDICTED  FAILURE  RATE  CALCULATIONS 
FOR  EACH  SUBSYSTEM 

Central Processor Predicted Failure Rate

[Failures per million hours]

                                        Fixed         Lab  Field Test
Qty    Part Description                Ground      System      System

163    Integrated Circuits              30.97       30.97       30.97
84     Capacitors, Ceramic                .50         .99        2.48
24     Capacitors, Tantalum               .58        1.15        3.84
138    Resistors, Carbon                 2.90        3.45        3.86
1      Transformer                        .30         .50        1.00
2800   Solder Connections               15.96       15.96       15.96
4      Connectors, 100 Pin              52.80       96.00      192.00
1      Oscillator Crystal                 .02         .02         .02

       TOTAL                           104.03      149.04      250.13
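
The TOTAL row is the sum of the part-level rates, and a total rate can be converted to a mean time between failures (MTBF). The sketch below re-derives the Field Test System total for the central processor. The MTBF conversion assumes a constant failure rate and that any single part failure fails the unit; that assumption, and the variable names, are introduced here for illustration and are not stated in the report.

```python
# Part-level predicted rates for the central processor, Field Test System
# column, in failures per million hours.
FIELD_TEST_RATES = {
    "Integrated Circuits": 30.97,
    "Capacitors, Ceramic": 2.48,
    "Capacitors, Tantalum": 3.84,
    "Resistors, Carbon": 3.86,
    "Transformer": 1.00,
    "Solder Connections": 15.96,
    "Connectors, 100 Pin": 192.00,
    "Oscillator Crystal": 0.02,
}

# Series model (assumption): the unit fails if any part fails, so rates add.
total_rate = sum(FIELD_TEST_RATES.values())

# Constant-failure-rate model (assumption): MTBF is the reciprocal of the rate.
mtbf_hours = 1_000_000 / total_rate

print(f"total: {total_rate:.2f} failures per million hours")
print(f"MTBF:  {mtbf_hours:.0f} hours")
```

Under these assumptions the connectors dominate the prediction: they account for 192.00 of the 250.13 failures per million hours.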

Operator Console Failure Rate:

[Failures per million hours]

                                        Fixed         Lab  Field Test
Qty    Part Description                Ground      System      System

20     Integrated Circuits               3.80        3.80        3.80
65     Resistors, Carbon                 1.37        1.63        1.82
1      Capacitor, Tantalum                .02         .05         .16
150    Solder Connections                 .86         .86         .86
24     Switches                          6.00       24.00       84.00
37     Lamps                            18.50       37.00       74.00

       TOTAL                            30.55       67.34      164.64

4K Memory Module Failure Rate:

[Failures per million hours]

                                        Fixed         Lab
Qty    Part Description                Ground      System  Field Test

80     Integrated Circuits              15.20       15.20       15.20
90     Diodes, Signal                     .26         .34         .51
16     Transistors NPN, Signal           3.36        6.72       13.44
102    Capacitors, Ceramic                .60        1.20        3.01
32     Capacitors, Tantalum               .77        1.54        5.12
209    Resistors, Carbon                 4.39        5.23        5.85
2000   Soldered Connections             11.40       11.40       11.40
2      Connectors, 5 Pin                 5.50       10.00       20.00
2      Connectors, 100 Pin              26.40       48.00       96.00
4      Thermistors                       1.20        1.20        1.20
64000  Ferrite Cores                     2.56        2.56        2.56

       TOTAL                            71.64      103.39      174.29



8K Memory Module Failure Rate:

[Failures per million hours]

                                        Fixed         Lab
Qty    Part Description                Ground      System  Field Test

100    Integrated Circuits              19.00       19.00       19.00
100    Diodes, Signal                     .29         .38         .57
16     Transistors NPN, Signal           3.36        6.72       13.44
120    Capacitors, Ceramic                .71        1.42        3.54
32     Capacitors, Tantalum               .77        1.54        5.12
230    Resistors, Carbon                 4.83        5.75        6.44
2200   Soldered Connections             12.54       12.54       12.54
2      Connectors, 5 Pin                 5.50       10.00       20.00
2      Connectors, 100 Pin              26.40       48.00       96.00
4      Thermistors                       1.20        1.20        1.20
128000 Ferrite Cores                     5.12        5.12        5.12

       TOTAL                            79.72      111.67      182.97

Real Time Clock & Interval Timer Failure Rate:

[Failures per million hours]

Qty    Part Description            Lab System  Field Test

55     Integrated Circuits              10.45       10.45
10     Resistors                          .25         .28
6      Capacitors                         .04         .60
1      Transistor, Power                 2.50        5.00
1      Oscillator Crystal                 .02         .02
800    Solder Connections                4.56        4.56
       Wire wrap Connections             0.01        0.01
1      Connector, 100 Pin               24.00       48.00

       TOTAL                            41.83       68.92

DCC Asynchronous Line Units Failure Rate:

[Failures per million hours]

Qty    Part Description            Field Test

100    Integrated Circuits              19.00
120    Capacitors                       12.00
3      Transistors, Power               21.00*
10     Transistors, Signal              12.00
8      LSI Chips                         1.52
150    Resistors                         4.20
2000   Solder Connections               11.40
2      Connectors, 100 Pin              96.00
1      Oscillator Crystal                 .02

       TOTAL                           177.14

* Average value for PNP and NPN power transistors

Data General Synchronous Line Adapters Failure Rate:

[Failures per million hours]

Qty    Part Description            Field Test

100    Integrated Circuits              19.00
50     Resistors                         1.40
70     Capacitors                        7.00
2000   Soldered Connections             11.40
2      Connectors, 100 Pin              96.00
1      Oscillator Crystal                 .02

       TOTAL                           134.82

Digital I/O Failure Rate:

a)  I/O Boards  [Failures per million hours]

Qty    Part Description            Field Test

35     Integrated Circuits               6.65
15     Capacitors                        1.50
20     Resistors                          .56
500    Solder Connections                2.85
1      Connector, 100 Pin               48.00

       TOTAL                            59.56

b)  Terminator

Qty    Part Description            Field Test

10     Integrated Circuits               1.90
150    Resistors                         4.20
50     Capacitors                        5.00
500    Solder Connections                2.85
1      Connector, 100 Pin               48.00

       TOTAL                            61.95

Power Supply Failure Rate:

[Failures per million hours]

                                        Fixed         Lab
Qty    Part Description                Ground      System  Field Test

8      Integrated Circuits               1.52        1.52        1.52
6      Diodes, Power                     5.40       16.20       54.00
4      Diodes, Zener                     3.08        4.62        7.70
4      Transistors, NPN, Power           3.28        8.20       19.68
5      Transistors, PNP, Power           6.70       16.75       40.20
3      Capacitors, Electrolytic           .13         .25         .50
12     Capacitors, Tantalum               .29         .58        1.92
87     Resistors, Carbon                 1.83        2.18        2.44
2      Resistors, Wire                    .38         .76        1.52
3      Transformers & Inductors           .90        1.50        3.00
150    Solder Connections                 .86         .86         .86
2      Fuses                              .20         .20         .20
1      Circuit Breaker                    .50         .50         .50
1      Fan                               4.00        4.00        4.00

       TOTAL                            29.07       58.12      138.04

Other Subsystems:

The failure rates for peripheral devices were estimated as follows,
based on experience with equipment of similar complexity:

[Failures per million hours]

                         Lab System                       Field Test
Subsystem       Controller   Device   Total       Controller   Device   Total

Disk               100         200     300*          150         300     450
Tape               100         400     500*          150         600     750
DG ALU             120          -      120            -           -       -
Paper Tape          50         150     200            75         225     300
Card Reader         50         150     200            -           -       -
Printer            100         400     500           150         600     750

* According to TACC Automation reliability report.

Failure rates for the modems and teletype were not estimated because of
lack of information.